|
Clustering is the assignment of objects into groups (called ''clusters'') so that objects from the same cluster are more similar to each other than objects from different clusters. Often similarity is assessed according to a distance measure. Clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Consensus clustering has emerged as an important elaboration of the classical clustering problem. Consensus clustering, also called aggregation of clustering (or partitions), refers to the situation in which a number of different (input) clusterings have been obtained for a particular dataset and it is desired to find a single (consensus) clustering which is a better fit in some sense than the existing clusterings. Consensus clustering is thus the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. When cast as an optimization problem, consensus clustering is known as median partition, and has been shown to be NP-complete. Consensus clustering for unsupervised learning is analogous to ensemble learning in supervised learning.〔 ==Issues with existing clustering techniques== * Current clustering techniques do not address all the requirements adequately. * Dealing with large number of dimensions and large number of data items can be problematic because of time complexity; * Effectiveness of the method depends on the definition of "distance" (for distance based clustering) * If an obvious distance measure doesn’t exist we must "define" it, which is not always easy, especially in multidimensional spaces. * The result of the clustering algorithm (that in many cases can be arbitrary itself) can be interpreted in different ways. 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「consensus clustering」の詳細全文を読む スポンサード リンク
|